S&P 500 Latest Stocks Analysis¶
=====================================
In this notebook, we will analyze the most recent five years (2019-2024) of stock data from the index
TABLE OF CONTENTS
Data Sources¶
- Yahoo Finance
- S&P 500 data
Analysis of Most Traded Stocks¶
==================================
Key Analyses¶
- Volume Analysis
- Closing Price Analysis
Comparative Analysis of Top Tech Stocks¶
==========================================
- Selection of Top 6 Tech Stocks in Software-Infrastructure Industry by Market Cap
- Mean Price Analysis
- Volatility Analysis
- Risk-Return Analysis
- Moving Average Analysis
- Golden Cross and Death Cross Identification
Fundamental Analysis¶
=====================
- Institutional Ownership Analysis
  - Top 10 Institutional Holders for Various Related Stocks
- Earnings Per Share (EPS) Analysis
- Price-to-Earnings (PE) Ratio Analysis
- Piotroski F-Score Analysis
  - Profitability Metrics: Return on Assets (ROA), Return on Equity (ROE)
  - Leverage Metrics: Long-term Debt to Equity Ratio, Interest Coverage Ratio
  - Operating Efficiency Metrics: Asset Turnover Ratio, Operating Cash Flow Margin
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.colors import LinearSegmentedColormap
import seaborn as sns
from scipy import stats
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, plot, iplot
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_squared_error
df1=pd.read_csv(r"C:\Users\PMLS\Downloads\sAND P TOP 5000 COAMPNIES NAMES\sp500_stocks.csv")
df1.head()
| Date | Symbol | Adj Close | Close | High | Low | Open | Volume | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-01-04 | MMM | 44.016724 | 69.414719 | 69.774246 | 69.122070 | 69.473244 | 3640265.0 |
| 1 | 2010-01-05 | MMM | 43.741020 | 68.979935 | 69.590302 | 68.311035 | 69.230766 | 3405012.0 |
| 2 | 2010-01-06 | MMM | 44.361343 | 69.958191 | 70.735786 | 69.824417 | 70.133781 | 6301126.0 |
| 3 | 2010-01-07 | MMM | 44.393166 | 70.008362 | 70.033447 | 68.662209 | 69.665550 | 5346240.0 |
| 4 | 2010-01-08 | MMM | 44.705982 | 70.501671 | 70.501671 | 69.648827 | 69.974915 | 4073337.0 |
Data Cleaning¶
df1['Date'] = pd.to_datetime(df1['Date']) # Converted to date time
now_time = datetime.now()
five_years_ago = now_time - timedelta(days=5*365)
df = df1[df1['Date'] >= five_years_ago]
df = df.reset_index(drop=True)
- Let's check if any of the dates are missing
df['Date'] = pd.to_datetime(df['Date'])
# Define the start and end dates
start_date = df['Date'].min() # Use the minimum date from the DataFrame
end_date = pd.to_datetime('2024-09-04')
# Generate a date range from start_date to end_date
full_date_range = pd.date_range(start=start_date, end=end_date)
# Extract the unique dates from the 'Date' column in df
available_dates = pd.to_datetime(df['Date'].unique())
# Find the missing dates by comparing the full range with available dates
missing_dates = full_date_range.difference(available_dates)
# Get the count of missing dates
missing_count = len(missing_dates)
# Print the count of missing dates
print(f"Count of Missing Dates: {missing_count}")
num_to_display = min(missing_count, 5) # Here I will only show the first 5 missing dates
for missing_date in missing_dates[:num_to_display]:
print(f"{missing_date.strftime('%d %B %Y')} is missing")
Count of Missing Dates: 557 19 October 2019 is missing 20 October 2019 is missing 26 October 2019 is missing 27 October 2019 is missing 02 November 2019 is missing
While This Number (557) May Seem Small Here,¶
Note: The period from September 9, 2019, to September 9, 2024, contains exactly 1,827 days.
- That means roughly 30.5% of the calendar dates are missing from our data, a sizeable share that could affect our results (though most of these fall on weekends and market holidays, when exchanges are closed).
- To address this, we wrote a script that fetches the data for these specific dates directly from the yfinance API, based on the stock list we provide.
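Before refetching anything, it is worth checking how much of that gap is simply non-trading days. A quick sanity-check sketch using `pd.bdate_range` over the same five-year window:

```python
import pandas as pd

# Every calendar day vs. business days (Mon-Fri) in the window from the note
full_range = pd.date_range("2019-09-09", "2024-09-04")
business_days = pd.bdate_range("2019-09-09", "2024-09-04")
weekend_days = len(full_range) - len(business_days)
print(len(full_range), weekend_days)
```

Weekends alone account for the bulk of the "missing" dates; market holidays would shrink the true gap further.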
df=df.rename(columns={"Symbol":"ticks"})
df.columns = df.columns.str.lower()
- Function to fetch the missing dates
def fill_missing_dates(df, tickers):
    df['date'] = pd.to_datetime(df['date'], utc=False)
    start_date = df['date'].min()
    end_date = df['date'].max()
    full_date_range = pd.date_range(start=start_date, end=end_date)
    # Create a dictionary to store data for each ticker
    # (copies avoid SettingWithCopy issues when we modify them below)
    ticker_data = {ticker: df[df['ticks'] == ticker].copy() for ticker in tickers}
    for ticker in tickers:
        ticker_df = ticker_data[ticker]
        # Find missing dates for this ticker
        available_dates = pd.to_datetime(ticker_df['date'].unique())
        missing_dates = full_date_range.difference(available_dates)
        if len(missing_dates) > 0:
            print(f"Fetching {len(missing_dates)} missing dates for {ticker}")
            # Fetch missing data from Yahoo Finance
            # Note: history() returns every trading day in [min, max], so dates
            # we already have may be refetched along with the missing ones
            yf_ticker = yf.Ticker(ticker)
            missing_data = yf_ticker.history(start=missing_dates.min(), end=missing_dates.max() + timedelta(days=1))
            # Process the fetched data
            missing_data = missing_data.reset_index()
            missing_data = missing_data.rename(columns={
                'Date': 'date',
                'Open': 'open',
                'High': 'high',
                'Low': 'low',
                'Close': 'close',
                'Adj Close': 'adj close',
                'Volume': 'volume'
            })
            missing_data['ticks'] = ticker
            # Add any missing columns with NaN values
            for col in df.columns:
                if col not in missing_data.columns:
                    missing_data[col] = pd.NA
            # Convert 'date' to tz-naive (remove timezone info) in both frames
            ticker_df['date'] = pd.to_datetime(ticker_df['date']).dt.tz_localize(None)
            missing_data['date'] = pd.to_datetime(missing_data['date']).dt.tz_localize(None)
            # Combine the fetched data with the existing data
            ticker_data[ticker] = pd.concat([ticker_df, missing_data]).sort_values('date')
    # Combine all ticker data back into a single dataframe
    updated_df = pd.concat(ticker_data.values()).sort_values(['ticks', 'date'])
    updated_df = updated_df.reset_index(drop=True)
    return updated_df
# This is how we will be using this Function
tickers = ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']
updated_df = fill_missing_dates(df, tickers)
Fetching 557 missing dates for NVDA Fetching 557 missing dates for TSLA Fetching 557 missing dates for AAPL Fetching 557 missing dates for AMZN Fetching 557 missing dates for F Fetching 557 missing dates for AMD Fetching 557 missing dates for BAC Fetching 557 missing dates for T Fetching 557 missing dates for AAL Fetching 557 missing dates for CCL
Checking Our Dates again¶
updated_df.isnull().sum()
date 0 ticks 0 adj close 12240 close 0 high 0 low 0 open 0 volume 0 Dividends 12300 Stock Splits 12300 dtype: int64
- Now we have no missing data in the core price and volume columns; the remaining NaNs are confined to adj close, Dividends, and Stock Splits
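Since `adj close` is used later for the moving-average analysis, those NaNs do matter. One possible fix, sketched on a hypothetical miniature frame standing in for `updated_df`, relies on the fact that yfinance's `history()` auto-adjusts prices by default, so for the refetched rows `close` can stand in for the missing `adj close`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for updated_df: refetched rows lack 'adj close'
updated_df = pd.DataFrame({
    "ticks": ["TSLA", "TSLA"],
    "close": [214.11, 219.41],
    "adj close": [np.nan, 219.41],
})
# Backfill the missing adjusted closes from the (auto-adjusted) close column
updated_df["adj close"] = updated_df["adj close"].fillna(updated_df["close"])
print(updated_df["adj close"].isna().sum())  # → 0
```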
new_df = updated_df
mean_dict = {}
# Find the average traded volume of each ticker over the period
for key in new_df['ticks'].unique():
    value = new_df[new_df['ticks'] == key]['volume'].mean()
    mean_dict[key] = value
print("Length of the mean of ticks dictionary:", len(mean_dict))
# Convert the dict to a pandas Series
average_s = pd.Series(mean_dict)
top10_s = average_s.sort_values(ascending=False)[:10]
print("Top 10 company tickers with highest average traded stock volume:\n", top10_s.index)
Length of the mean of ticks dictionary: 10 Top 10 company tickers with highest average traded stock volume: Index(['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL'], dtype='object')
- F is Ford Motor Company
- T is AT&T Inc.
tickers = ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']
updated_df = fill_missing_dates(df, tickers)
updated_df
Fetching 557 missing dates for NVDA Fetching 557 missing dates for TSLA Fetching 557 missing dates for AAPL Fetching 557 missing dates for AMZN Fetching 557 missing dates for F Fetching 557 missing dates for AMD Fetching 557 missing dates for BAC Fetching 557 missing dates for T Fetching 557 missing dates for AAL Fetching 557 missing dates for CCL
| date | ticks | adj close | close | high | low | open | volume | Dividends | Stock Splits | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-10-15 | AAL | 28.074911 | 28.270000 | 28.370001 | 27.280001 | 27.629999 | 7019200.0 | NaN | NaN |
| 1 | 2019-10-16 | AAL | 27.856426 | 28.049999 | 28.809999 | 28.030001 | 28.330000 | 6353300.0 | NaN | NaN |
| 2 | 2019-10-17 | AAL | 27.737255 | 27.930000 | 28.440001 | 27.850000 | 28.309999 | 6709600.0 | NaN | NaN |
| 3 | 2019-10-18 | AAL | 28.025253 | 28.219999 | 28.320000 | 27.709999 | 27.799999 | 5689100.0 | NaN | NaN |
| 4 | 2019-10-21 | AAL | NaN | 28.422495 | 28.710493 | 28.233806 | 28.313253 | 6746400.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24535 | 2024-08-29 | TSLA | NaN | 206.279999 | 214.889999 | 205.970001 | 209.800003 | 62308800.0 | 0.0 | 0.0 |
| 24536 | 2024-08-30 | TSLA | 214.110001 | 214.110001 | 214.570007 | 207.029999 | 208.630005 | 63370600.0 | NaN | NaN |
| 24537 | 2024-08-30 | TSLA | NaN | 214.110001 | 214.570007 | 207.029999 | 208.630005 | 63370600.0 | 0.0 | 0.0 |
| 24538 | 2024-09-03 | TSLA | 210.600006 | 210.600006 | 219.899994 | 209.639999 | 215.259995 | 76500900.0 | NaN | NaN |
| 24539 | 2024-09-04 | TSLA | 219.410004 | 219.410004 | 222.220001 | 210.619995 | 210.759995 | 80181916.0 | NaN | NaN |
24540 rows × 10 columns
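Notice that the table above contains the same (ticker, date) pair twice (e.g. TSLA on 2024-08-30): because the refetch window spans non-trading days, dates we already had can come back a second time. A minimal sketch of one way to drop such duplicates, on a hypothetical slice mirroring those rows:

```python
import pandas as pd

# Hypothetical slice mirroring the duplicated TSLA rows above
df = pd.DataFrame({
    "ticks": ["TSLA", "TSLA", "TSLA"],
    "date": pd.to_datetime(["2024-08-29", "2024-08-30", "2024-08-30"]),
    "close": [206.28, 214.11, 214.11],
})
# Keep the first row per (ticks, date) pair, dropping refetched duplicates
deduped = df.drop_duplicates(subset=["ticks", "date"], keep="first")
print(len(deduped))  # → 2
```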
def subdataframe(df, tick):
    # top 10 ticks
    ticks = list(top10_s.index)
    assert tick in ticks, """Stock tick does not belong to top 10 stocks by trade volume, please try any of these:
    ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']"""
    ndf = df[df['ticks'] == tick]
    return ndf
sp500Cmp_info_df = pd.read_csv('C:/Users/PMLS/Downloads/sAND P TOP 5000 COAMPNIES NAMES/sp500_companies.csv')
top10_highest_trade = ['NVDA', 'TSLA', 'AAPL', 'AMZN', 'F', 'AMD', 'BAC', 'T', 'AAL', 'CCL']
info_df = pd.DataFrame({'tick': top10_highest_trade})
info_df
| tick | |
|---|---|
| 0 | NVDA |
| 1 | TSLA |
| 2 | AAPL |
| 3 | AMZN |
| 4 | F |
| 5 | AMD |
| 6 | BAC |
| 7 | T |
| 8 | AAL |
| 9 | CCL |
Final_info = pd.merge(info_df, sp500Cmp_info_df, how='left', left_on='tick', right_on='Symbol')
Final_info =Final_info[['tick','Shortname','Sector','Industry']]
Final_info
| tick | Shortname | Sector | Industry | |
|---|---|---|---|---|
| 0 | NVDA | NVIDIA Corporation | Technology | Semiconductors |
| 1 | TSLA | Tesla, Inc. | Consumer Cyclical | Auto Manufacturers |
| 2 | AAPL | Apple Inc. | Technology | Consumer Electronics |
| 3 | AMZN | Amazon.com, Inc. | Consumer Cyclical | Internet Retail |
| 4 | F | Ford Motor Company | Consumer Cyclical | Auto Manufacturers |
| 5 | AMD | Advanced Micro Devices, Inc. | Technology | Semiconductors |
| 6 | BAC | Bank of America Corporation | Financial Services | Banks - Diversified |
| 7 | T | AT&T Inc. | Communication Services | Telecom Services |
| 8 | AAL | American Airlines Group, Inc. | Industrials | Airlines |
| 9 | CCL | Carnival Corporation | Consumer Cyclical | Travel Services |
# Update company dict for your top 10 companies
company_dict = {
'NVDA': 'NVIDIA',
'TSLA': 'Tesla',
'AAPL': 'Apple',
'AMZN': 'Amazon',
'F': 'Ford Motor Company',
'AMD': 'Advanced Micro Devices',
'BAC': 'Bank of America',
'T': 'AT&T',
'AAL': 'American Airlines',
'CCL': 'Carnival Corporation'
}
# Create sub-dataframes for each company
nvda_df = subdataframe(new_df, 'NVDA')
tsla_df = subdataframe(new_df, 'TSLA')
aapl_df = subdataframe(new_df, 'AAPL')
amzn_df = subdataframe(new_df, 'AMZN')
f_df = subdataframe(new_df, 'F')
amd_df = subdataframe(new_df, 'AMD')
bac_df = subdataframe(new_df, 'BAC')
t_df = subdataframe(new_df, 'T')
aal_df = subdataframe(new_df, 'AAL')
ccl_df = subdataframe(new_df, 'CCL')
# Define a function to calculate the daily return, daily mean price, and company name
def dailyfunc(df):
    df = df.copy()
    df['daily return'] = ((df['close'] - df['open']) / df['open']) * 100
    df['daily_mean'] = (df['open'] + df['close'] + df['high'] + df['low']) / 4
    df['co_name'] = company_dict[df['ticks'].unique()[0]]
    return df
# Apply dailyfunc to each company's dataframe
nvda_df = dailyfunc(nvda_df)
tsla_df = dailyfunc(tsla_df)
aapl_df = dailyfunc(aapl_df)
amzn_df = dailyfunc(amzn_df)
f_df = dailyfunc(f_df)
amd_df = dailyfunc(amd_df)
bac_df = dailyfunc(bac_df)
t_df = dailyfunc(t_df)
aal_df = dailyfunc(aal_df)
ccl_df = dailyfunc(ccl_df)
# Print the start and end date for each company as one more confirmation
print('\t\tStart Date\t\t\t\t\tEnd Date')
print(f"NVDA\t\t{nvda_df['date'].min()}\t\t\t{nvda_df['date'].max()}")
print(f"TSLA\t\t{tsla_df['date'].min()}\t\t\t{tsla_df['date'].max()}")
print(f"AAPL\t\t{aapl_df['date'].min()}\t\t\t{aapl_df['date'].max()}")
print(f"AMZN\t\t{amzn_df['date'].min()}\t\t\t{amzn_df['date'].max()}")
print(f"F\t\t{f_df['date'].min()}\t\t\t{f_df['date'].max()}")
print(f"AMD\t\t{amd_df['date'].min()}\t\t\t{amd_df['date'].max()}")
print(f"BAC\t\t{bac_df['date'].min()}\t\t\t{bac_df['date'].max()}")
print(f"T\t\t{t_df['date'].min()}\t\t\t{t_df['date'].max()}")
print(f"AAL\t\t{aal_df['date'].min()}\t\t\t{aal_df['date'].max()}")
print(f"CCL\t\t{ccl_df['date'].min()}\t\t\t{ccl_df['date'].max()}")
Start Date End Date NVDA 2010-01-04 2024-09-04 TSLA 2010-01-04 2024-09-04 AAPL 2010-01-04 2024-09-04 AMZN 2010-01-04 2024-09-04 F 2010-01-04 2024-09-04 AMD 2010-01-04 2024-09-04 BAC 2010-01-04 2024-09-04 T 2010-01-04 2024-09-04 AAL 2010-01-04 2024-09-04 CCL 2010-01-04 2024-09-04
- The start and end dates match across all the stocks, which validates the date coverage of our data
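The eyeball check above can also be done programmatically. A sketch, where `frames` is a hypothetical stand-in for the per-ticker dataframes:

```python
import pandas as pd

# Hypothetical stand-in for the per-ticker dataframes created above
frames = {
    "NVDA": pd.DataFrame({"date": pd.to_datetime(["2019-10-15", "2024-09-04"])}),
    "TSLA": pd.DataFrame({"date": pd.to_datetime(["2019-10-15", "2024-09-04"])}),
}
starts = {t: f["date"].min() for t, f in frames.items()}
ends = {t: f["date"].max() for t, f in frames.items()}
# Fail loudly if any ticker's coverage differs from the others
assert len(set(starts.values())) == 1, f"start dates differ: {starts}"
assert len(set(ends.values())) == 1, f"end dates differ: {ends}"
print("all tickers share one start and one end date")
```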
Technical Analysis¶
- Let's start by finding the all-time high price of our most traded stocks
def plot_closing_stock_prices(dfs, ncols=2):
    # Calculate the number of rows needed
    nrows = (len(dfs) + ncols - 1) // ncols
    # Create the figure and axes for the subplots
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15, 5 * nrows), facecolor='#f4f4f4')
    fig.subplots_adjust(hspace=0.4, wspace=0.3)
    for ax, df in zip(axes.flat, dfs):
        # Find the highest closing price and the date it occurred
        high = df['close'].max()
        high_date = df[df['close'] == high]['date'].values[0]
        # Define color based on the stock ticker
        if df['ticks'].unique()[0] in ['GE', 'F']:
            facecolor = '#ed615c'
        else:
            facecolor = '#4bd659'
        # Plot the data
        ax.plot(df['date'], df['close'], color='#0f2113')
        ax.set_title(f"{df['co_name'].unique()[0]} Stock Price", fontsize=16, fontweight='bold')
        ax.set_xlabel("Date", fontsize=12)
        ax.set_ylabel("Daily Closing Stock Price", fontsize=12)
        ax.set_facecolor('#ffffff')  # Background color for each subplot
        # Annotate the all-time high
        ax.annotate(
            f"All time high price during\nfive year period\nwas ${high:.2f}",
            xy=(high_date, high),
            xytext=(high_date, high - 0.1 * high),
            bbox=dict(boxstyle="round", facecolor='#f5d3bf', edgecolor='#d0d5db'),
            arrowprops=dict(facecolor='#f0190a', headlength=25, shrink=0.1)
        )
    # Hide any unused subplots
    for ax in axes.flat[len(dfs):]:
        ax.set_visible(False)
    plt.show()
# Call the function with your dataframes
plot_closing_stock_prices([nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df])
Trade Volume Analysis¶
Let's look at the exact volume numbers to see how heavily these stocks were traded
- Identify when the trade volume was highest and at what value
def create_trade_volume_grid_single_row(dataframes):
    # One column per dataframe, all in a single row
    cols = len(dataframes)
    fig = make_subplots(rows=1, cols=cols, subplot_titles=[df['co_name'].unique()[0] for df in dataframes])
    for i, df in enumerate(dataframes, 1):
        # All plots sit on row 1; only the column changes
        row = 1
        col = i
        # Calculate statistics
        ave_y = df['volume'].mean()
        max_y = df['volume'].max()
        min_y = df['volume'].min()
        max_date = df[df['volume'] == max_y]['date'].values[0]
        # Add trace for volume
        fig.add_trace(
            go.Scatter(x=df['date'], y=df['volume'], name='Volume', line=dict(color='#00CED1')),
            row=row, col=col
        )
        # Horizontal lines for max, min, and average volume
        fig.add_hline(y=max_y, line_dash="dash", line_color="red", row=row, col=col)
        fig.add_hline(y=min_y, line_dash="dash", line_color="green", row=row, col=col)
        fig.add_hline(y=ave_y, line_dash="dash", line_color="yellow", row=row, col=col)
        # Vertical line at the date of maximum volume
        fig.add_vline(x=max_date, line_dash="dash", line_color="purple", row=row, col=col)
        # Update axes
        fig.update_xaxes(title_text="Date", row=row, col=col)
        fig.update_yaxes(title_text="Volume", row=row, col=col)
    # Update layout
    fig.update_layout(height=300, width=400 * cols, title_text="Stock Trade Volumes",
                      showlegend=False, template="plotly_dark")
    return fig
# Example usage
dataframes = [nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df]
fig = create_trade_volume_grid_single_row(dataframes)
fig.show()
list_df = [nvda_df, tsla_df, aapl_df, amzn_df, f_df, amd_df, bac_df, t_df, aal_df, ccl_df]
# loop through list_df to find the first and last closing price of each stock
mini = [df[df['date'] == df['date'].min()]['close'].values.item() for df in list_df]
maxi = [df[df['date'] == df['date'].max()]['close'].values.item() for df in list_df]
# find the absolute difference between the first and last closing prices
diff = np.array(maxi) - np.array(mini)
# find the percentage growth
growth = (diff/mini)*100
growth_list = growth.tolist()
co_name_list = [df['co_name'].unique()[0] for df in list_df]
# visualize the growth of the stocks
fig, ax = plt.subplots(figsize=(13,7))
ax.barh(y=co_name_list, width=growth_list, height=0.9, color=['#4bd659','#4bd659','#4bd659','#4bd659','#4bd659',
'#4bd659','#4bd659','#ed615c','#ed615c','#ed615c'],
edgecolor='#713ae8')
for p in ax.patches:
    ax.annotate(f'{round(p.get_width(), 2)}%', (p.get_width() + 15, p.get_y() + 0.3))
ax.set_xlabel('Percentage growth in stock price')
ax.set_ylabel('Name of companies')
ax.set_title("Growth in stock price over a period of 5 years")
plt.show()
Comparative Analysis of Top Tech Stocks in the Index¶
Now we move on from our most traded stocks to a comparative analysis of the top stocks in the tech sector.
- To do this, we first filter the original dataframe for the top 6 tech stocks in the Software - Infrastructure industry by market cap
ab = sp500Cmp_info_df[sp500Cmp_info_df['Industry'] == 'Software - Infrastructure']
ab=ab.sort_values(by='Marketcap',ascending=False)
ab=ab.head(6)
ab
| Exchange | Symbol | Shortname | Longname | Sector | Industry | Currentprice | Marketcap | Ebitda | Revenuegrowth | City | State | Country | Fulltimeemployees | Longbusinesssummary | Weight | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NMS | MSFT | Microsoft Corporation | Microsoft Corporation | Technology | Software - Infrastructure | 408.90 | 3039369887744 | 1.294330e+11 | 0.152 | Redmond | WA | United States | 228000.0 | Microsoft Corporation develops and supports so... | 0.059346 |
| 20 | NYQ | ORCL | Oracle Corporation | Oracle Corporation | Technology | Software - Infrastructure | 140.75 | 387887300608 | 2.122700e+10 | 0.033 | Austin | TX | United States | 159000.0 | Oracle Corporation offers products and service... | 0.007574 |
| 28 | NMS | ADBE | Adobe Inc. | Adobe Inc. | Technology | Software - Infrastructure | 575.25 | 255065849856 | 7.844000e+09 | 0.102 | San Jose | CA | United States | 29945.0 | Adobe Inc., together with its subsidiaries, op... | 0.004980 |
| 87 | NMS | PANW | Palo Alto Networks, Inc. | Palo Alto Networks, Inc. | Technology | Software - Infrastructure | 346.15 | 112533364736 | 9.601666e+08 | 0.121 | Santa Clara | CA | United States | NaN | Palo Alto Networks, Inc. provides cybersecurit... | 0.002197 |
| 130 | NMS | SNPS | Synopsys, Inc. | Synopsys, Inc. | Technology | Software - Infrastructure | 477.53 | 73355296768 | 1.652390e+09 | 0.127 | Sunnyvale | CA | United States | 20300.0 | Synopsys, Inc. provides electronic design auto... | 0.001432 |
| 149 | NMS | CRWD | CrowdStrike Holdings, Inc. | CrowdStrike Holdings, Inc. | Technology | Software - Infrastructure | 259.32 | 63566077952 | 1.874900e+08 | 0.317 | Austin | TX | United States | 9219.0 | CrowdStrike Holdings, Inc. provides cybersecur... | 0.001241 |
new_df=pd.read_csv(r"C:\Users\PMLS\Downloads\sAND P TOP 5000 COAMPNIES NAMES\sp500_stocks.csv")
new_df=new_df.rename(columns={"Symbol":"ticks"})
new_df.columns = new_df.columns.str.lower()
company_dict = {
'MSFT': 'Microsoft',
'ORCL': 'Oracle Corporation',
'ADBE': 'Adobe Inc.',
'PANW': 'Palo Alto Networks',
'SNPS': 'Synopsys Inc.',
'CRWD': 'CrowdStrike Holdings'
}
def subdataframe(df, tick, *ticks):
    ticks_list = list(ticks)
    assert tick in ticks_list, f"Stock tick does not belong to the provided list of ticks: {ticks_list}"
    ndf = df[df['ticks'] == tick]
    return ndf
TechStocks = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Create individual DataFrames for each stock
MSFT_df = subdataframe(new_df, 'MSFT', *TechStocks)
ORCL_df = subdataframe(new_df, 'ORCL', *TechStocks)
ADBE_df = subdataframe(new_df, 'ADBE', *TechStocks)
PANW_df = subdataframe(new_df, 'PANW', *TechStocks)
SNPS_df = subdataframe(new_df, 'SNPS', *TechStocks)
CRWD_df = subdataframe(new_df, 'CRWD', *TechStocks)
# Apply the dailyfunc function to each DataFrame
MSFT_df = dailyfunc(MSFT_df)
ORCL_df = dailyfunc(ORCL_df)
ADBE_df = dailyfunc(ADBE_df)
PANW_df = dailyfunc(PANW_df)
SNPS_df = dailyfunc(SNPS_df)
CRWD_df = dailyfunc(CRWD_df)
Mean Price of Each Stock¶
background_gradient = LinearSegmentedColormap.from_list("", ["#212121", "#1A1D23", "#03055B"])
line_colors = ['#FFFF00', '#00BFFF', '#32CD32', '#FF00FF', '#FFA500', '#d62728']
# Create the figure and axis with dark background
fig, ax = plt.subplots(figsize=(14, 7))
fig.patch.set_facecolor('#1A1D23') # Set figure background color to match theme
ax.set_facecolor('#212121') # Set axis background color
# Plot each stock's daily mean price with distinct colors and line widths for visibility on dark theme
ax.plot(MSFT_df['date'], MSFT_df['daily_mean'], label='Microsoft (MSFT)', color=line_colors[0], lw=1.4)
ax.plot(ORCL_df['date'], ORCL_df['daily_mean'], label='Oracle (ORCL)', color=line_colors[1], lw=1.4)
ax.plot(ADBE_df['date'], ADBE_df['daily_mean'], label='Adobe (ADBE)', color=line_colors[2], lw=1.4)
ax.plot(PANW_df['date'], PANW_df['daily_mean'], label='Palo Alto Networks (PANW)', color=line_colors[3], lw=1.4)
ax.plot(SNPS_df['date'], SNPS_df['daily_mean'], label='Synopsys (SNPS)', color=line_colors[4], lw=1.4)
ax.plot(CRWD_df['date'], CRWD_df['daily_mean'], label='CrowdStrike (CRWD)', color=line_colors[5], lw=1.4)
# Format the x-axis to display years only
ax.xaxis.set_major_locator(mdates.YearLocator()) # Place tick at the start of each year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y')) # Format to show only the year
plt.xticks(rotation=0, color='white') # Make tick labels white for dark background
# Add title and labels with improved formatting
ax.set_title("Comparative Analysis of Tech Stock Prices Based on Mean Price", fontsize=16, fontweight='bold', color='white')
ax.set_ylabel("Daily Average Stock Price", fontsize=14, color='white')
# Add legend with a clear background
ax.legend(facecolor='#1A1D23', fontsize="medium", title="Tech Stocks", title_fontsize=13, labelcolor='white', edgecolor='white')
# Add gridlines for better readability, change to a softer color to match the dark theme
ax.grid(True, color='#444444')
# Show the plot
plt.tight_layout()
plt.show()
Daily Mean Formula = (Open + Close + High + Low) / 4
Volatility Analysis¶
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Set up the plot
plt.figure(figsize=(14, 8))
# Loop through each DataFrame and plot only the KDE (no histograms)
for df, label in zip(dataframes, labels):
    # KDE of each DataFrame's 'daily return'; colors cycle automatically for differentiation
    sns.kdeplot(df['daily return'], label=label, linewidth=2)
# Add labels and title
plt.title('KDE of Daily Returns for Top 6 Stocks')
plt.xlabel('Daily Return')
plt.ylabel('Density')
plt.legend(title='Stock')
# Show the plot
plt.show()
This standardized chart shows the variation in daily returns; CRWD appears to be the most volatile among them.
- A box-and-whisker plot shows this even better, since it also gives a clearer picture of the outliers
dataframes = [MSFT_df, ORCL_df, ADBE_df, PANW_df, SNPS_df, CRWD_df]
labels = ['MSFT', 'ORCL', 'ADBE', 'PANW', 'SNPS', 'CRWD']
# Create a combined DataFrame for plotting
combined_df = pd.DataFrame()
# Loop through dataframes and collect the standard deviation of 'daily return'
std_devs = {}
for df, label in zip(dataframes, labels):
    # Create a temporary DataFrame with a 'Stock' column
    temp_df = df[['daily return']].copy()
    temp_df['Stock'] = label
    # Append to the combined DataFrame
    combined_df = pd.concat([combined_df, temp_df])
    # Calculate and store the standard deviation for this stock
    std_devs[label] = temp_df['daily return'].std()
# Sort stocks by their standard deviation (volatility)
sorted_labels = sorted(std_devs, key=std_devs.get)
# Set up the plot
plt.figure(figsize=(12, 6))
# Create the box plot, sorted by volatility
sns.boxplot(x='Stock', y='daily return', data=combined_df, order=sorted_labels, palette="Set2")
# Add labels and title
plt.title('Box Plot of Daily Returns for Top 6 Stocks (Sorted by Volatility)')
plt.ylabel('Daily Return')
# Show the plot
plt.show()
Risk Analysis¶
How much value do we put at risk by investing in a particular stock?
Risk analysis is performed by plotting the expected return (mean) against the risk (standard deviation) to show Volatility for each stock in a scatter plot.
means = []
std_devs = []
# Loop through dataframes and calculate the mean and standard deviation for each stock
for df in dataframes:
    mean_return = df['daily return'].mean()
    std_dev = df['daily return'].std()
    means.append(mean_return)
    std_devs.append(std_dev)
# Define size and color for scatter plot
area = np.pi * 20 # Size of the scatter plot marker
colors = ['blue', 'green', 'red', 'purple', 'orange', 'brown'] # Colors for each stock
# Create a scatter plot
plt.figure(figsize=(12, 8))
# Plot each stock with a different color and annotate it
for mean, std, label, color in zip(means, std_devs, labels, colors):
    plt.scatter(mean, std, s=area, c=color, label=label)
    plt.annotate(label, xy=(mean, std), xytext=(50, 50), textcoords='offset points',
                 ha='right', va='bottom',
                 arrowprops=dict(arrowstyle='-', color=color, connectionstyle='arc3,rad=-0.3'))
# Add labels, title, and legend
plt.xlabel('Expected Return (Mean)')
plt.ylabel('Risk (Standard Deviation)')
plt.title('Scatter Plot of Risk vs Expected Return for Top 6 Stocks')
# Show the plot
plt.show()
Moving Average¶
A very basic way to smooth out the noise is to use moving averages
- Their real value here lies in the Golden Cross and Death Cross signals they generate
So we are going to compute them for two comparable companies and compare the results
dataframes = [PANW_df, CRWD_df]
labels = ['PANW', 'CRWD']
PANW_df['date'] = pd.to_datetime(PANW_df['date'])
CRWD_df['date'] = pd.to_datetime(CRWD_df['date'])
PANW_df.set_index('date', inplace=True)
CRWD_df.set_index('date', inplace=True)
short_window = 50
long_window = 200
# Loop through both companies and calculate the moving averages
for company, label in zip([PANW_df, CRWD_df], ['PANW', 'CRWD']):
    company['Short MA'] = company['adj close'].rolling(window=short_window).mean()
    company['Long MA'] = company['adj close'].rolling(window=long_window).mean()
    # Detect Golden Cross and Death Cross: Signal is 1 while short MA > long MA,
    # so its diff is +1 at a golden cross and -1 at a death cross
    company['Signal'] = np.where(company['Short MA'] > company['Long MA'], 1.0, 0.0)
    company['Cross'] = company['Signal'].diff()
# Function to plot stock with Golden Cross and Death Cross
def plot_stock_with_cross(company, name, ax):
    ax.plot(company['adj close'], label='Adjusted Close', alpha=0.6)
    ax.plot(company['Short MA'], label=f'{short_window}-day MA', alpha=0.7)
    ax.plot(company['Long MA'], label=f'{long_window}-day MA', alpha=0.7)
    # Mark Golden Crosses (1) and Death Crosses (-1)
    ax.plot(company[company['Cross'] == 1].index, company['Short MA'][company['Cross'] == 1], '^', markersize=10, color='g', lw=0, label='Golden Cross')
    ax.plot(company[company['Cross'] == -1].index, company['Short MA'][company['Cross'] == -1], 'v', markersize=10, color='r', lw=0, label='Death Cross')
    ax.set_title(name)
    ax.legend()
    # Major ticks at every year, labeled by year
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
    # Ensure the x-axis spans 2019 through 2024
    ax.set_xlim([pd.Timestamp('2019-01-01'), pd.Timestamp('2024-12-31')])
# Set up plot with subplots for PANW and CRWD
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
# Plot for PANW and CRWD
plot_stock_with_cross(PANW_df, 'PANW', axes[0])
plot_stock_with_cross(CRWD_df, 'CRWD', axes[1])
plt.tight_layout()
plt.show()
- CrowdStrike has had two death crosses, one of them in 2024; we have a good idea of why and when that happened
- We will analyze the most recent event that could have caused it
check = CRWD_df[CRWD_df['Cross'] == -1]
check
| ticks | adj close | close | high | low | open | volume | daily return | daily_mean | co_name | Short MA | Long MA | Signal | Cross | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||
| 2022-01-04 | CRWD | 189.190002 | 189.190002 | 197.445007 | 183.320007 | 195.889999 | 6503400.0 | -3.420285 | 191.461254 | CrowdStrike Holdings | 235.3460 | 236.554925 | 0.0 | -1.0 |
| 2024-09-03 | CRWD | 266.600006 | 266.600006 | 277.510010 | 265.290009 | 275.779999 | 5485800.0 | -3.328738 | 271.295006 | CrowdStrike Holdings | 299.4642 | 300.198974 | 0.0 | -1.0 |
- It looks like our analysis holds up: CrowdStrike did indeed have a death cross
- Full Article

Market and Equity Analysis of Stocks¶
symbols = ['GOOGL', 'MSFT','WMT','AMZN','BABA','ORCL','SAP','CRM']
holders_data = []
for symbol in symbols:
    stock = yf.Ticker(symbol)
    try:
        institutional_holders = stock.institutional_holders
        holders_data.append({
            'Stock Symbol': symbol,
            'Top 10 Institutional Holders': institutional_holders.head(10)
        })
    except Exception:
        holders_data.append({
            'Stock Symbol': symbol,
            'Top 10 Institutional Holders': None
        })
Holders_data_df = pd.DataFrame(holders_data)
Holders_data_df.set_index('Stock Symbol', inplace=True)
Holders_data_df
| Top 10 Institutional Holders | |
|---|---|
| Stock Symbol | |
| GOOGL | Date Reported Ho... |
| MSFT | Date Reported Ho... |
| WMT | Date Reported Ho... |
| AMZN | Date Reported Ho... |
| BABA | Date Reported H... |
| ORCL | Date Reported ... |
| SAP | Date Reported Hold... |
| CRM | Date Reported Holder... |
Equity ownership analysis¶
By Top 10 Institutional Stakeholders
from plotly.subplots import make_subplots
def create_interactive_holdings_pie_chart(df):
    # Create a 4x2 subplot grid of pie ('domain') charts
    fig = make_subplots(rows=4, cols=2, subplot_titles=df.index, specs=[[{'type': 'domain'}, {'type': 'domain'}]] * 4)
    for i, (symbol, data) in enumerate(df.iterrows(), 1):
        row = (i - 1) // 2 + 1
        col = (i - 1) % 2 + 1
        if data['Top 10 Institutional Holders'] is not None:
            holders = data['Top 10 Institutional Holders']
            # Pie chart of each stock's top institutional holders
            fig.add_trace(
                go.Pie(
                    labels=holders['Holder'],
                    values=holders['Shares'],
                    name=symbol,
                    textposition='inside',
                    textinfo='percent+label',
                    hoverinfo='label+value+percent',
                    textfont_size=8
                ),
                row=row, col=col
            )
    # Update layout
    fig.update_layout(
        height=2000,
        width=1400,
        title_text="Top 10 Institutional Holders for Various Stocks",
        showlegend=False
    )
    return fig
# Assuming Holders_data_df is your DataFrame with the data
# Create the plot
fig = create_interactive_holdings_pie_chart(Holders_data_df)
# Show the plot
fig.show()
- It's quite interesting to note how these big investment firms hold very similar stakes across these tech giants.
Next, we will run a side-by-side comparison of some stocks that are direct alternatives to each other¶
| S&P Index Comp | Alternate 1 | Alternate 2 | Stock Symbols |
|---|---|---|---|
| Google (Alphabet) | Microsoft | — | GOOG/GOOGL, MSFT |
| Walmart | Amazon | Alibaba | WMT, AMZN, BABA |
| Oracle | SAP | Salesforce | ORCL, SAP, CRM |
symbols = ['GOOGL', 'MSFT','WMT','AMZN','BABA','ORCL','SAP','CRM']
Historic EPS and PE Ratio¶
Obtaining current EPS and PE ratio data from YFinance is straightforward, but finding a way to get it for the last 12 months has been a significant challenge in my research.
- Either the sources are paid, such as Quandl, Zacks, or the Alpha Vantage API
- Or, web scraping is not allowed, leading to forbidden errors
My Approach¶
- Initially, I planned to scrape the data from the Macrotrends Website using pages like this
- However, scraping is not allowed, and I encountered a forbidden (HTTP 403) error. If you know how to get around it, you can use code similar to the block below
- As a worst-case scenario, you can save the tables in a text file and read them manually to extract the data
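One common way to attack the 403 is to fetch the HTML yourself with a browser-like `User-Agent` header and hand the text to `pd.read_html`. This is a hedged sketch only: it is not guaranteed to work against Macrotrends, and you should check the site's terms of service before trying it. The `fetch_html` and `parse_pe_tables` helper names are my own, not part of any library:

```python
import pandas as pd
import requests
from io import StringIO

# Some servers reject the default urllib/pandas user agent with a 403;
# sending a browser-like User-Agent header *may* get a normal response.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def fetch_html(url, timeout=30):
    # Fetch the page ourselves instead of letting pd.read_html open the URL
    resp = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
    resp.raise_for_status()
    return resp.text

def parse_pe_tables(html):
    # pd.read_html parses a string of HTML just as well as a URL
    return pd.read_html(StringIO(html))

# Usage (network required, may still be blocked):
# html = fetch_html("https://www.macrotrends.net/stocks/charts/AAPL/apple/pe-ratio")
# df = parse_pe_tables(html)[0]
```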
import pandas as pd

STOCK_URLS = {
    'AAPL': 'https://www.macrotrends.net/stocks/charts/AAPL/apple/pe-ratio',
    'AMD': 'https://www.macrotrends.net/stocks/charts/AMD/amd/pe-ratio',
    'NVDA': 'https://www.macrotrends.net/stocks/charts/NVDA/nvidia/pe-ratio',
    'MSFT': 'https://www.macrotrends.net/stocks/charts/MSFT/microsoft/pe-ratio'
}
# Initialize empty DataFrames
dfs = {stock: pd.DataFrame() for stock in STOCK_URLS}
# Loop through each stock and URL
for stock, url in STOCK_URLS.items():
    data = pd.read_html(url, skiprows=1)
    df = pd.DataFrame(data[0])
    # Push the misread header row back into the data
    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
    df.columns = range(len(df.columns))
    df = df[1:]
    df = df.rename(columns={0: 'Date', 1: 'Price', 2: 'EPS', 3: 'PE ratio'})
    df.loc[df.index[1:], 'EPS'] = ''
    df.set_index('Date', inplace=True)
    df = df.sort_index()
    df['trend'] = ''
    df['PE ratio'] = df['PE ratio'].astype(float)
    dfs[stock] = df  # store the cleaned frame
file_paths = {
    'AMD': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\AMD PE Ratio 2010-2024.txt",
    'Apple': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Apple PE Ratio 2010-2024.txt",
    'Microsoft': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\Microsoft PE Ratio 2010-2024.txt",
    'NVIDIA': r"C:\Users\PMLS\Desktop\MicroTrends _Stock_info\NVIDIA PE Ratio 2010-2024.txt"
}

def clean_eps(value):
    # EPS values are stored like '$1.23'; strip the dollar sign
    if pd.isna(value) or value == '':
        return None
    return float(value.replace('$', ''))

dfs = {}
for stock, file_path in file_paths.items():
    df = pd.read_csv(file_path, sep=r'\s+', header=None, names=['Date', 'Price', 'EPS', 'PE ratio'])
    df['EPS'] = df['EPS'].apply(clean_eps)
    df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
    df['PE ratio'] = pd.to_numeric(df['PE ratio'], errors='coerce')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    df = df.iloc[1:]
    df.sort_index(inplace=True, ascending=False)
    dfs[stock] = df
# Extract the EPS columns from the DataFrames
eps_values = [df['EPS'] for df in dfs.values()]
fig = go.Figure()
# Add lines for each stock
for stock, eps in zip(dfs.keys(), eps_values):
    fig.add_trace(go.Scatter(x=eps.index, y=eps.values, name=stock, line=dict(width=2)))
# Set title and labels
fig.update_layout(
    title='EPS Over Time',
    xaxis_title='Date',
    yaxis_title='EPS',
    plot_bgcolor='#333333',
    paper_bgcolor='#333333',
    font_color='#ffffff'
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
# Set vibrant line colors
fig.update_traces(marker=dict(color=['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']))
# Show the legend and set the figure size (in pixels)
fig.update_layout(showlegend=True, width=600, height=300)
# Show the plot
fig.show()
- Microsoft and Apple continue their steady EPS growth
pe_ratios = [df['PE ratio'] for df in dfs.values()]
stock_symbols = list(dfs.keys())
# Create the figure
fig = go.Figure()
# Add lines for all stocks, using the stock symbols from the dictionary keys
for i, stock in enumerate(stock_symbols):
    fig.add_trace(go.Scatter(x=pe_ratios[i].index, y=pe_ratios[i].values, name=stock,
                             line=dict(width=2), visible=(i < 2)))
# Set title and labels
fig.update_layout(
    title='PE Ratio Over Time',
    xaxis_title='Date',
    yaxis_title='PE Ratio',
    plot_bgcolor='#333333',
    paper_bgcolor='#333333',
    font_color='#ffffff'
)
# Set vibrant line colors
fig.update_traces(marker=dict(color=['#ff69b4', '#33cc33', '#6666ff', '#ffcc00']))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
# Add a dropdown menu to toggle which traces are visible
buttons = [
    dict(label=f'{stock_symbols[0]} & {stock_symbols[3]}', method='update',
         args=[{'visible': [True, False, False, True]}]),
    dict(label=f'{stock_symbols[1]} & {stock_symbols[2]}', method='update',
         args=[{'visible': [False, True, True, False]}]),
    dict(label='All 4', method='update', args=[{'visible': [True, True, True, True]}])
]
fig.update_layout(updatemenus=[dict(buttons=buttons, direction='down', showactive=True)])
# Show the legend
fig.update_layout(showlegend=True)
# Show the plot
fig.show()
PE Ratio Insights¶
The ideal PE Ratio is difficult to define in the tech industry, but on average, it falls between 15-16. For mature companies, it can be larger, ranging from 20-25. [1]
From the chart we can see that:
- AMD: In September 2023, AMD hit a remarkable all-time-high PE ratio of 856.83, accompanied by significant volatility.
- Nvidia: Nvidia maintained a less variable PE ratio which, although still above average, showed more stability.
[1] Source: What is a Good PE Ratio?
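For reference, the quantity plotted above is simply price divided by trailing-twelve-month earnings per share. A minimal sketch (`trailing_pe` is a hypothetical helper, and the 102.82 / 0.12 price-EPS pair is a made-up combination chosen only to show how a near-zero EPS produces spikes of the magnitude seen for AMD):

```python
def trailing_pe(price, ttm_eps):
    # Trailing PE = current price / trailing-twelve-month EPS
    if ttm_eps is None or ttm_eps <= 0:
        return None  # PE is undefined for zero or negative earnings
    return price / ttm_eps

print(trailing_pe(102.82, 0.12))  # near-zero EPS inflates PE into the hundreds
print(trailing_pe(102.82, 5.00))  # a healthy EPS brings PE back to ~20
```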
Fundamental Analysis with Piotroski F-Score¶
=============================================
This script performs fundamental analysis on a list of custom companies using a modified Piotroski F-Score.
Script Description¶
Our robust script utilizes a variation of the Piotroski F-Score, updating the original metrics to incorporate modern financial data and analysis techniques. Piotroski Score
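For contrast with the weighted variant implemented below, the original Piotroski F-Score awards one binary point per criterion, nine points maximum. A minimal sketch over a dict of pre-fetched fundamentals (the field names and example numbers are assumptions for illustration, not yfinance keys):

```python
def piotroski_f_score(d):
    """Classic 9-point binary F-Score; `_py` suffix = prior-year value."""
    score = 0
    # Profitability (4 points)
    score += d["net_income"] > 0
    score += d["roa"] > 0
    score += d["op_cash_flow"] > 0
    score += d["op_cash_flow"] > d["net_income"]         # accruals check
    # Leverage / liquidity (3 points)
    score += d["lt_debt_ratio"] < d["lt_debt_ratio_py"]  # falling leverage
    score += d["current_ratio"] > d["current_ratio_py"]  # rising liquidity
    score += not d["issued_shares"]                      # no dilution
    # Operating efficiency (2 points)
    score += d["gross_margin"] > d["gross_margin_py"]
    score += d["asset_turnover"] > d["asset_turnover_py"]
    return int(score)

example = {
    "net_income": 5e9, "roa": 0.12, "op_cash_flow": 7e9,
    "lt_debt_ratio": 0.20, "lt_debt_ratio_py": 0.25,
    "current_ratio": 2.1, "current_ratio_py": 1.8,
    "issued_shares": False,
    "gross_margin": 0.55, "gross_margin_py": 0.52,
    "asset_turnover": 0.9, "asset_turnover_py": 0.8,
}
print(piotroski_f_score(example))  # all nine criteria pass -> 9
```

The script below replaces these binary checks with capped continuous scores and custom weights, which is why its totals are not on the 0-9 scale.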
tickers = {
    'Cybersecurity': ['PANW', 'CRWD', 'FTNT'],
    'E-commerce': ['ETSY', 'SHOP', 'W'],
    'Fintech': ['SQ', 'PYPL', 'V'],
    'Software': ['TEAM', 'ZM', 'MSFT'],
    'Gaming': ['EA', 'TTWO', 'AIY.DE'],
    'Automotive': ['RIVN', 'MBLY', 'VFS', 'TSLA']
}
def get_data(ticker):
    try:
        ticker_obj = yf.Ticker(ticker)
        info = ticker_obj.info
        financials = ticker_obj.financials
        balance_sheet = ticker_obj.balance_sheet
        cash_flow = ticker_obj.cashflow
        if financials.empty or balance_sheet.empty or cash_flow.empty:
            print(f"Warning: Missing data for {ticker}")
            return None, None, None, None, None
        return ticker_obj, info, financials, balance_sheet, cash_flow
    except Exception as e:
        print(f"Error fetching data for {ticker}: {str(e)}")
        return None, None, None, None, None

def calculate_pe(info):
    pe = info.get('trailingPE', 0) or 0
    return min(max(pe, 0), 100)  # Cap PE ratio between 0 and 100

def get_value(df, possible_names, year, ticker):
    # yfinance statement row labels vary, so try each candidate name
    for name in possible_names:
        if name in df.index and year in df.columns:
            return df.loc[name, year]
    print(f"Warning: Could not find any of {possible_names} in dataframe for year {year} for {ticker}")
    return 0
def calculate_profitability(financials, balance_sheet, cash_flow, ticker):
    try:
        years = financials.columns
        if len(years) < 2:
            return 0
        latest_year, previous_year = years[0], years[1]
        net_income = get_value(financials, ['Net Income', 'Net Income Common Stockholders'], latest_year, ticker)
        net_income_py = get_value(financials, ['Net Income', 'Net Income Common Stockholders'], previous_year, ticker)
        op_cf = get_value(cash_flow, ['Operating Cash Flow', 'Cash Flow from Operations'], latest_year, ticker)
        total_assets = get_value(balance_sheet, ['Total Assets'], latest_year, ticker)
        total_assets_py = get_value(balance_sheet, ['Total Assets'], previous_year, ticker)
        avg_assets = (total_assets + total_assets_py) / 2 if total_assets and total_assets_py else total_assets or total_assets_py or 1
        RoA = net_income / avg_assets if avg_assets else 0
        RoA_py = net_income_py / avg_assets if avg_assets else 0
        accruals = op_cf / total_assets - RoA if total_assets else 0
        # Granular scoring, capped per metric
        ni_score = min(net_income / 1e9, 2) if net_income > 0 else 0  # Cap at 2 for net income > $2B
        ni_growth_score = min((net_income - net_income_py) / abs(net_income_py) if net_income_py else 0, 1)
        op_cf_score = min(op_cf / 1e9, 2) if op_cf > 0 else 0  # Cap at 2 for operating cash flow > $2B
        roa_score = min(RoA * 10, 2)  # Cap at 2 for ROA > 20%
        roa_growth_score = min((RoA - RoA_py) * 5, 1)  # Cap at 1 for ROA growth > 20%
        accruals_score = min(accruals * 5, 1) if accruals > 0 else 0
        return ni_score + ni_growth_score + op_cf_score + roa_score + roa_growth_score + accruals_score
    except Exception as e:
        print(f"Error calculating profitability for {ticker}: {str(e)}")
        return 0
def calculate_leverage(balance_sheet, ticker):
    try:
        latest_year = balance_sheet.columns[0]
        lt_debt = get_value(balance_sheet, ['Long Term Debt', 'Total Long Term Debt', 'Net Debt'], latest_year, ticker)
        total_assets = get_value(balance_sheet, ['Total Assets'], latest_year, ticker)
        current_assets = get_value(balance_sheet, ['Total Current Assets', 'Current Assets'], latest_year, ticker)
        current_liab = get_value(balance_sheet, ['Total Current Liabilities', 'Current Liabilities'], latest_year, ticker)
        debt_ratio = lt_debt / total_assets if total_assets else 0
        current_ratio = current_assets / current_liab if current_liab else 0
        # More granular scoring
        debt_ratio_score = max(0, 2 - debt_ratio * 5)  # 0 score for debt_ratio > 0.4, max 2 for debt_ratio = 0
        current_ratio_score = min(current_ratio - 1, 2) if current_ratio > 1 else 0  # Max 2 for current_ratio > 3
        return debt_ratio_score + current_ratio_score
    except Exception as e:
        print(f"Error calculating leverage for {ticker}: {str(e)}")
        return 0
def calculate_operating_efficiency(financials, balance_sheet, ticker):
    try:
        years = financials.columns
        if len(years) < 2:
            return 0
        latest_year, previous_year = years[0], years[1]
        gp = get_value(financials, ['Gross Profit'], latest_year, ticker)
        gp_py = get_value(financials, ['Gross Profit'], previous_year, ticker)
        revenue = get_value(financials, ['Total Revenue', 'Revenue'], latest_year, ticker)
        revenue_py = get_value(financials, ['Total Revenue', 'Revenue'], previous_year, ticker)
        total_assets = get_value(balance_sheet, ['Total Assets'], latest_year, ticker)
        total_assets_py = get_value(balance_sheet, ['Total Assets'], previous_year, ticker)
        gm = gp / revenue if revenue else 0
        gm_py = gp_py / revenue_py if revenue_py else 0
        avg_assets = (total_assets + total_assets_py) / 2 if total_assets and total_assets_py else total_assets or total_assets_py or 1
        at = revenue / avg_assets if avg_assets else 0
        at_py = revenue_py / avg_assets if avg_assets else 0
        # More granular scoring
        gm_score = min(gm * 2, 2)  # Max 2 for gross margin > 50%
        gm_growth_score = min((gm - gm_py) * 10, 1)  # Max 1 for 10% growth in gross margin
        at_score = min(at, 2)  # Max 2 for asset turnover >= 2
        at_growth_score = min((at - at_py) * 5, 1)  # Max 1 for 20% growth in asset turnover
        return gm_score + gm_growth_score + at_score + at_growth_score
    except Exception as e:
        print(f"Error calculating operating efficiency for {ticker}: {str(e)}")
        return 0
summary = pd.DataFrame(columns=['Industry', 'Ticker', 'Company Name', 'PE Ratio', 'Profitability', 'Leverage', 'Operating Efficiency'])
for industry, symbols in tickers.items():
    for ticker in symbols:
        ticker_obj, info, financials, balance_sheet, cash_flow = get_data(ticker)
        if ticker_obj is None or info is None or financials is None or balance_sheet is None or cash_flow is None:
            print(f"Skipping {ticker} due to missing data")
            continue
        pe_ratio = calculate_pe(info)
        profitability = calculate_profitability(financials, balance_sheet, cash_flow, ticker)
        leverage = calculate_leverage(balance_sheet, ticker)
        operating_efficiency = calculate_operating_efficiency(financials, balance_sheet, ticker)
        # Fetch the company name from the info dictionary
        company_name = info.get('longName', ticker)
        new_row = {
            'Industry': industry,
            'Ticker': ticker,
            'Company Name': company_name,
            'PE Ratio': pe_ratio,
            'Profitability': profitability,
            'Leverage': leverage,
            'Operating Efficiency': operating_efficiency
        }
        # DataFrame._append is a private method; pd.concat is the supported way
        summary = pd.concat([summary, pd.DataFrame([new_row])], ignore_index=True)
        print(f"{ticker} ({company_name}) added.")
# Calculate Total Score with weights
weights = {'Profitability': 0.4, 'Leverage': 0.3, 'Operating Efficiency': 0.3}
summary['Total Score'] = (summary['Profitability'] * weights['Profitability'] +
                          summary['Leverage'] * weights['Leverage'] +
                          summary['Operating Efficiency'] * weights['Operating Efficiency'])
# Normalize scores so the best company in each column scores 10
for column in ['Profitability', 'Leverage', 'Operating Efficiency', 'Total Score']:
    max_score = summary[column].max()
    if max_score > 0:
        summary[column] = summary[column] / max_score * 10
# Sort by Total Score
summary = summary.sort_values(by='Total Score', ascending=False)
# Save to CSV
summary.to_csv('Summary_with_names.csv', index=False)
print("Summary file with company names added.")
summary
PANW (Palo Alto Networks, Inc.) added.
CRWD (CrowdStrike Holdings, Inc.) added.
FTNT (Fortinet, Inc.) added.
ETSY (Etsy, Inc.) added.
SHOP (Shopify Inc.) added.
W (Wayfair Inc.) added.
SQ (Block, Inc.) added.
PYPL (PayPal Holdings, Inc.) added.
V (Visa Inc.) added.
TEAM (Atlassian Corporation) added.
Warning: Could not find any of ['Long Term Debt', 'Total Long Term Debt', 'Net Debt'] in dataframe for year 2024-01-31 00:00:00 for ZM
ZM (Zoom Video Communications, Inc.) added.
MSFT (Microsoft Corporation) added.
EA (Electronic Arts Inc.) added.
TTWO (Take-Two Interactive Software, Inc.) added.
AIY.DE (Activision Blizzard Inc) added.
RIVN (Rivian Automotive, Inc.) added.
Warning: Could not find any of ['Long Term Debt', 'Total Long Term Debt', 'Net Debt'] in dataframe for year 2023-12-31 00:00:00 for MBLY
MBLY (Mobileye Global Inc.) added.
VFS (VinFast Auto Ltd.) added.
TSLA (Tesla, Inc.) added.
Summary file with company names added.
| Industry | Ticker | Company Name | PE Ratio | Profitability | Leverage | Operating Efficiency | Total Score | |
|---|---|---|---|---|---|---|---|---|
| 11 | Software | MSFT | Microsoft Corporation | 35.341260 | 9.050168 | 4.645531 | 6.968310 | 10.0 |
| 10 | Software | ZM | Zoom Video Communications, Inc. | 25.242857 | 6.530246 | 10.0 | 6.540146 | 9.676711 |
| 2 | Cybersecurity | FTNT | Fortinet, Inc. | 49.035503 | 8.106195 | 3.768044 | 9.122103 | 9.594741 |
| 0 | Cybersecurity | PANW | Palo Alto Networks, Inc. | 51.263737 | 10.000000 | 0.0 | 7.313343 | 9.355254 |
| 8 | Fintech | V | Visa Inc. | 29.747322 | 8.931555 | 3.302906 | 6.186266 | 9.28916 |
| 18 | Automotive | TSLA | Tesla, Inc. | 61.352116 | 8.225444 | 6.500295 | 4.307100 | 9.261078 |
| 12 | Gaming | EA | Electronic Arts Inc. | 33.703530 | 7.479406 | 4.183104 | 6.801571 | 8.646036 |
| 4 | E-commerce | SHOP | Shopify Inc. | 84.080800 | 4.947972 | 8.986636 | 6.936685 | 8.291835 |
| 3 | E-commerce | ETSY | Etsy, Inc. | 23.111626 | 6.829372 | 3.023663 | 7.820137 | 8.072221 |
| 7 | Fintech | PYPL | PayPal Holdings, Inc. | 19.541264 | 7.557439 | 4.255449 | 3.026382 | 7.73083 |
| 1 | Cybersecurity | CRWD | CrowdStrike Holdings, Inc. | 100.000000 | 4.793815 | 5.512961 | 8.633508 | 7.547236 |
| 14 | Gaming | AIY.DE | Activision Blizzard Inc | 0.000000 | 4.930864 | 8.351623 | 3.335013 | 7.132296 |
| 9 | Software | TEAM | Atlassian Corporation | 0.000000 | 3.321770 | 3.082009 | 10.000000 | 6.060256 |
| 16 | Automotive | MBLY | Mobileye Global Inc. | 0.000000 | 1.671184 | 10.0 | 3.858088 | 5.360779 |
| 6 | Fintech | SQ | Block, Inc. | 64.537030 | 1.686429 | 5.426277 | 5.973935 | 4.511236 |
| 5 | E-commerce | W | Wayfair Inc. | 0.000000 | 0.755730 | 0.0 | 7.544339 | 2.550857 |
| 15 | Automotive | RIVN | Rivian Automotive, Inc. | 0.000000 | -3.399105 | 6.698802 | 3.356351 | 0.43888 |
| 13 | Gaming | TTWO | Take-Two Interactive Software, Inc. | 0.000000 | -6.862910 | 1.870831 | 3.346543 | -3.633858 |
| 17 | Automotive | VFS | VinFast Auto Ltd. | 0.000000 | -6.735669 | 1.908362 | 2.351831 | -3.790046 |
- It is evident that many of these top companies easily pass most of the metrics we set, since they are profitable, established industry leaders; this kind of analysis may be more revealing when applied to small- and mid-cap companies.